DOCUMENT RESUME

ED 137 351                                        TM 006 152

AUTHOR       Hunter, John E.; Schmidt, Frank L.
TITLE        Fairness of Selection Tests: A Critical Analysis.
             Professional Series 76-5.
INSTITUTION  Civil Service Commission, Washington, D.C. Personnel
             Measurement Research and Development Center.
PUB DATE     Sep 76

EDRS PRICE   MF-$0.83 HC-$2.06 Plus Postage.
DESCRIPTORS  *Culture Free Tests; *Ethics; *Minority Groups;
             *Personnel Selection; *Statistical Analysis; *Test
             Bias



ABSTRACT

The first section of this paper defines three
incompatible ethical positions in regard to the fair and unbiased use
of psychological tests for selection in minority and majority groups.
Also in this section, five statistical definitions of "test fairness"
are reviewed and examined critically for technical, logical, and
social weaknesses. In the second section of the paper, the various
statistical definitions are shown to correlate with specific ethical
positions, and the technical, logical, and social problems of each
statistical model-ethical position combination are delineated. It is
concluded that it is difficult, if not impossible, to predict at the
present time which model will ultimately prove most acceptable to the
American people. (Author)






FAIRNESS OF SELECTION TESTS: A CRITICAL ANALYSIS



John E. Hunter
Michigan State University

Frank L. Schmidt
Personnel Research and Development Center
U.S. Civil Service Commission



Personnel Research and Development Center
U.S. Civil Service Commission
1900 E Street N.W.
Washington, D.C. 20415
September 1976






ABSTRACT 



The first section of this paper defines three incompatible
ethical positions in regard to the fair and unbiased use
of psychological tests for selection in minority and majority
groups. Also in this section, five statistical definitions of
"test fairness" are reviewed and examined critically for
technical, logical, and social weaknesses. In the second section
of the paper, the various statistical definitions are shown to
correlate with specific ethical positions, and the technical,
logical, and social problems of each statistical model-ethical
position combination are delineated. It is concluded that it
is difficult, if not impossible, to predict at the present time
which model will ultimately prove most acceptable to the
American people.



PREFACE 



Those of us concerned with personnel selection and placement
cannot but be aware of the ever-increasing involvement of
the law and the courts in our professional work. This process
has gone so far that questions and issues that only yesterday
were regarded by most in the profession as highly specialized
and esoteric have become focal points in important,
precedent-setting litigation. The problem of defining test fairness
given equal test validity coefficients in the relevant applicant
subgroups is one such issue. One court decision focusing
specifically on this issue is now on record, and others are certain
to follow. The purpose of this publication is to provide
psychologists, lawyers, and administrators with a thorough yet
readable exploration of the issues and problems in this critical area.






CONTENTS



I.  STATISTICAL AND ETHICAL IMPLICATIONS OF FIVE
    DEFINITIONS OF "TEST FAIRNESS"

    John E. Hunter and Frank L. Schmidt

    Three Ethical Positions
        Unqualified Individualism
        Qualified Individualism and the Merit Principle
        The Quota Ethic

    Five Attempts to Define Test Fairness Statistically
        The Cleary Definition
        The Thorndike Definition
        Darlington's Definition No. 3
        Darlington's Definition No. 3 and Cole's Argument
        Darlington's Definition No. 4
        A Fifth Definition of Test Fairness

II. ETHICAL POSITIONS, STATISTICAL DEFINITIONS, AND
    PROBLEMS

    Frank L. Schmidt

    Unqualified Individualism: Models and Problems
    Qualified Individualism: Models and Problems
    The Quota Ethic: Models and Problems
    Ethical Systems, Statistical Models, Individual
        Merits and Social Goals

Figure 1. A case in which the white regression line
          underpredicts black performance

Figure 2. Regression artifacts produced by unreliability
          in a Cleary-defined "unbiased" test. A is the
          common regression line for a perfectly reliable
          test. B and C are the regression lines
          for whites and blacks respectively for a test
          of reliability .50.

Figure 3. Darlington's (1971) method of altering the
          data to define a "culturally optimal" test.

APPENDIX

FOOTNOTES

REFERENCES






I. STATISTICAL AND ETHICAL IMPLICATIONS OF
FIVE DEFINITIONS OF "TEST FAIRNESS"



John E. Hunter
Michigan State University

Frank L. Schmidt
Personnel Research and Development Center
U.S. Civil Service Commission



In the last several years there has been a series of papers
devoted to the question of the fairness of employment and
educational tests to minority groups (Cleary, 1968; Thorndike, 1971;
Darlington, 1971). Although each of these papers came to an
ethical conclusion, the basis for that ethical judgment was left
unclear. If there were only one ethically defensible position
regarding test fairness, then this would pose no problem. But such
is not the case. The papers which we shall review have a second
common feature. Each writer attempts to establish a definition
of test fairness on purely statistical grounds, i.e., on a basis
that is independent of the content of test and criterion and which
makes no explicit assumption about the causal explanation of the
statistical relations found. We will argue that this merely makes
the substantive considerations implicit rather than explicit.

In this paper we first describe three distinct ethical
positions. We will next examine five statistical definitions of test
fairness in detail and show how each is based on one of these
ethical positions. Finally, we shall examine the technical, social,
and legal advantages and disadvantages of the various ethical
positions and statistical definitions.



Three Ethical Positions 



Unqualified Individualism

The first ethical position we shall examine, unqualified
individualism, defines a fair selection, promotion, or admission
policy as one which uses the best statistical information
available, and all of that information, to predict each candidate's
future performance and then selects or admits those with the
highest predicted performance.






From this point of view, there are two ways in which an
institution could act unethically. First, it might knowingly fail
to use an available more valid predictor, e.g., it might select
on the basis of a candidate's appearance rather than his scores
on a valid ability test. Second, it might deliberately omit
a valid predictor that is known to be available, e.g., it might
exclude (for trivial reasons) a valid predictor from the
regression equation. If race, sex, or ethnic group membership is, in
fact, a valid predictor of performance in a given situation, over
and above the effects of other measured variables, the
unqualified individualist is ethically bound to use such a predictor.

Qualified Individualism and the Merit Principle

This ethical position differs from unqualified individualism
in that it specifically forbids the use of illegal or
unconstitutional predictors, no matter how valid. If, in a given situation,
race is in fact a valid predictor of performance, i.e., the
difference between the races on the criterion is greater than would
be predicted from the best measures of individual qualifications
available, then use of race to predict future job performance is
forbidden. Race constitutes an illegal predictor, and its use
would be discriminatory. To the unqualified individualist, on the
other hand, failure to use race as a predictor would be unethical
and discriminatory, since it would result in a less accurate
prediction of the future performance of applicants and would
"penalize" or underpredict performance of individuals from one of the
applicant groups. Unlike the unqualified individualist, the
qualified individualist relies solely on measures of ability and
motivation to perform the job, e.g., scores on valid aptitude and
achievement tests, assessments of past work experiences, etc.

The Quota Ethic

Most corporations and educational institutions are creatures
of the state or city in which they function. Thus, it has been
argued that they are ethically bound to act in a way which is
"politically appropriate" to their location. In particular, in a
city whose population is 45 percent black and 55 percent white,
any selection procedure which admits any other ratio of blacks
and whites is "politically biased" against one group or the other.
That is, it is assumed that any politically well defined group
has the right to ask and receive its "fair share" of any desirable
product or position which is under state control. These fair share
quotas may be based on population percentages or on other
factors irrelevant to predicted future performance of selectees
(Thorndike, 1971; Darlington, 1971).

Five Attempts To Define Test Fairness Statistically

In this section we will briefly review five attempts to
arrive at a statistical criterion for a fair or unbiased test. For
ease of presentation, the discussion will be in terms of
comparing blacks and whites. However, the reader should bear in mind
that other demographic classifications, such as social class or
sex, could be substituted with no loss of generality.

The Cleary Definition 

Cleary (1968) defined a test to be "unbiased" only if the
regression lines for blacks and whites are identical. The reason
for this is brought out in Figure 1, which shows a hypothetical
case in which the regression line for blacks lies above the line
for whites and is parallel to it. Consider a white and a black
subject who each have a score of A on the test. If the white
regression line were used to predict both criterion scores, then the
black applicant would be underpredicted by an amount equal to the
difference between his expected score making use of the fact that he is
black and the expected score assigned by the white regression line.
Actually in this situation, in order for a white subject to have
the same expected performance as a black whose score is A, the
white subject must have a score of B.

That is, if the white regression line underpredicts black
performance, then a white and black are only truly equal in their
expected performance if the white's test score is higher than the
black's by an amount related to the amount of underprediction.
Similarly, if the white regression line always overpredicts black
performance, then a black subject has equal expected performance
only if his test score is higher than the corresponding white
subject's score by an amount related to the amount of overprediction.
If the regression lines for blacks and whites are not equal, then
each person will receive a statistically valid predicted criterion
score only if separate regression lines are used for the two races.
If the two regression lines have exactly the same slope,
statistically unbiased prediction could be accomplished by predicting
performance from two separate regression equations or from a
multiple regression equation with test score and race as the
predictors. If the slopes are not equal, then either separate equations
must be used or the multiple regression equation must be expanded
by the usual product term for moderator variables. We can
therefore view Cleary's definition of an unbiased test as an attempt
to rule out disputes between qualified and unqualified
individualism. If the predictors available to an institution are "unbiased"
in Cleary's sense, then the question of whether or not to use race
as a predictor does not arise. But if the predictors are "biased,"
the recommended use of separate regression lines is clearly
equivalent to using race as a predictor of performance. Thus while Cleary
(1968) may show a preference for tests that meet the requirements
of both unqualified and qualified individualism, her position is,
in the final analysis, one of unqualified individualism.
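
The multiple regression with a product term described above can be sketched in code. This is an illustration only, not a procedure taken from Cleary (1968): it assumes NumPy is available, the data are invented and noise-free, and `fit_cleary_model` is a hypothetical helper name. The group indicator is coded 1 for whites and 0 for blacks, and the black line is placed above the white line as in Figure 1.

```python
import numpy as np

def fit_cleary_model(x, y, c):
    """Least-squares fit of y = b0 + b1*x + b2*c + b3*(x*c).

    x: test scores, y: criterion scores, c: 0/1 group indicator.
    b2 is the intercept difference between the groups and b3 the
    slope difference; both are zero when the regression lines are
    identical, which is Cleary's condition for an "unbiased" test.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.asarray(c, dtype=float)
    design = np.column_stack([np.ones_like(x), x, c, x * c])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Invented data: parallel lines, with the line for group 0 lying
# 0.5 criterion units above the line for group 1.
x = np.array([40.0, 50.0, 60.0, 70.0, 40.0, 50.0, 60.0, 70.0])
c = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = 1.0 + 0.05 * x - 0.5 * c

b0, b1, b2, b3 = fit_cleary_model(x, y, c)
```

A nonzero `b2` with `b3` equal to zero corresponds to the parallel-lines case, where adding race as a second predictor is enough; a nonzero `b3` corresponds to unequal slopes, where the product term (or separate equations) is needed.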

A Cleary-defined "unbiased" test is ethically acceptable
under the population-based quota ethic only under very special
circumstances. In addition to identical regression lines, blacks and
whites must have equal means and equal standard deviations on the
test, and this in turn implies equal means and standard deviations
on the performance measure. Furthermore, the proportion of black
and white applicants must be the same as their proportion in the
relevant population. These are conditions that rarely obtain.

Linn and Werts (1971) have pointed out an additional problem
for Cleary's definition: the problem of defining fairness when
using less than perfectly reliable tests. Suppose that a perfectly
reliable measure of ability were in fact an unbiased predictor in
Cleary's sense. But since perfect reliability is unattainable in
practice, the test actually used will contain a certain amount of
error variance. Will the imperfect test be "unbiased" in terms of
the regression equations for blacks and whites? If black applicants
have a lower mean score than white applicants, then the regression
lines for the imperfect test will not be equal. This situation is
illustrated in Figure 2. In this figure we see that if an unreliable
test is used, then that test produces the double regression line of
a "biased" test in which the white regression line overpredicts
black performance. That is, by Cleary's definition, the unreliable
test is "biased" against whites in favor of blacks.¹,²
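
The artifact can be made concrete with a short calculation. The sketch below is an illustration under classical test theory assumptions that the text does not spell out: mean performance is a single linear function of true score in both groups, and the expected true score for a given observed score regresses toward the group's own mean in proportion to the reliability. The function name and the numerical values are hypothetical.

```python
def observed_score_line(a, b, group_mean, reliability):
    """Regression of performance on observed test score within one group.

    Assumes mean performance = a + b * true_score for everyone (the
    common line), and that the expected true score given an observed
    score x is group_mean + reliability * (x - group_mean).
    Returns (intercept, slope) of the observed-score regression line.
    """
    slope = b * reliability
    intercept = a + b * (1.0 - reliability) * group_mean
    return intercept, slope

# Hypothetical values: common true-score line with a = 0, b = 1,
# group means one unit apart, test reliability .50.
white_line = observed_score_line(a=0.0, b=1.0, group_mean=0.0, reliability=0.5)
black_line = observed_score_line(a=0.0, b=1.0, group_mean=-1.0, reliability=0.5)

# Vertical distance between the two observed-score lines.
gap = white_line[0] - black_line[0]
```

Under these assumptions the two lines have the same slope, and the group with the higher mean has the higher intercept, so the white line overpredicts black performance at every observed score. The gap shrinks to zero as the reliability approaches 1.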

[Figure 2. Regression artifacts produced by unreliability in a
Cleary-defined "unbiased" test. A is the common regression line for
a perfectly reliable test. B and C are the regression lines for
whites and blacks respectively for a test of reliability .50.
Horizontal axis: Test Score, with the black mean and the white mean
marked; vertical axis: Performance.]

One may be inclined to question whether failure to attain
perfect reliability (impossible under any circumstances) should be
adequate grounds for labeling a test as biased. But suppose we
look at the situation from a different viewpoint. Suppose there
is only one ethnic group, whites, for example. Assume that Bill
has a true score of 115 and Jack's true score is 110. If
the test is a valid predictor of performance in this situation,
then Bill has the higher expected performance, and if a perfectly
reliable test is used, Bill will invariably be admitted ahead of
Jack. But suppose that the reliability is only .50. Then the two
obtained scores will each vary randomly from their true values, and
there is some probability that Bill's will be randomly low while
Jack's is randomly high, i.e., some probability that Jack will be
admitted ahead of Bill. If the standard deviation of the observed
ability scores is 15, then the difference between their observed
scores has a mean of 5 and a standard deviation of 21. The
probability of a negative difference is then .41. Thus the
probability that Bill will be admitted ahead of Jack drops from 1.00 to
.59. The unreliable test is in fact sharply "biased" against
better qualified applicants. It is obvious, however, that this bias
is not directly racial or cultural in nature. When there are
blacks and whites in the applicant pool, it takes on the
appearance of a racial bias because the proportion of better qualified
applicants is higher in the white group. Thus the bias created by
random error works against more applicants in the majority group,
and on balance the test is "biased" against that group as a whole.
But at the individual level, such a test is no more biased against
a well qualified white than a well qualified black. The question,
then, is whether Cleary's (1968) definition is defective in some
sense in labeling this situation as "biased." If so, it may
perhaps be desirable to modify the definition to apply only to "bias"
beyond that expected on the basis of test reliability alone.
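
The probabilities in the Bill and Jack example can be checked with a few lines of arithmetic. The sketch below assumes the observed difference is normally distributed and takes the text's figures (mean 5, standard deviation 21) as given; the helper names are hypothetical. A standard deviation of 21 is what results if each score's error standard deviation is taken to be 15 (15 times the square root of 2 is about 21.2). As a labeled alternative that is not from the source, the sketch also computes the case where the error variance of each score is (1 - reliability) times the observed variance; that gives a smaller standard deviation for the difference and a reversal probability of roughly .37, which leaves the argument unchanged.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_reversal(true_diff, sd_diff):
    """Probability that the observed difference (Bill minus Jack) is
    negative, assuming it is normal with mean true_diff and SD sd_diff."""
    return normal_cdf(-true_diff / sd_diff)

# Figures quoted in the text: mean difference 5, standard deviation 21.
p_text = prob_reversal(5.0, 21.0)

# Alternative assumption (not from the source): each score's error
# variance is (1 - reliability) * observed variance, errors independent.
reliability, sd_observed = 0.50, 15.0
sd_error = sd_observed * math.sqrt(1.0 - reliability)
p_alt = prob_reversal(5.0, sd_error * math.sqrt(2.0))
```

With the text's figures, `p_text` is about .41, so the chance that Bill is admitted ahead of Jack is about .59.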

Let us consider the comparison of whites and blacks on the
unreliable test in detail. We first remark that it is a fact that
on the average, the whites with a given score have a higher mean
performance than the blacks who have that same score. Thus the use
of a single regression line will, in fact, mean that the whites
near the cutoff will be denied admission in favor of blacks who will,
on the average, not perform as well. Such a situation would clearly
be labeled "biased." Furthermore, in this
situation the partial correlation between race and performance with
observed test score held constant is not zero. Thus race makes a
contribution to the multiple regression because, with an unreliable
test, race is in fact a valid predictor of performance after test
score is partialed out. From the viewpoint of unqualified
individualism, the failure to use race as a second predictor is unethical.
If the test is used with only one regression line, then the predictors
are in fact "biased" against whites as a group. If two
regression lines are used, then each person is being considered solely
on the basis of expected performance.

For this reason, we feel that this criticism of Cleary's
definition is essentially unwarranted. To use an unreliable
predictor is to blur valid differences between applicants, and an
unreliable test is thus, to the extent of the unreliability, biased
against applicants or groups of applicants who have high true scores
on the predictor. Thus from the point of view of an unqualified
individualist, an unreliable test is indeed "biased." On the other
hand, a qualified individualist would object to this conclusion.
Use of separate regression lines is statistically optimal because
the unreliable test does not account for all the real differences
on the true scores. But the qualified individualist is ethically
prohibited from using race as a predictor and therefore can employ
only a single regression equation. He can, however, console
himself with the fact that the "bias" in the test is not specifically
racial in nature. And he can, of course, attempt to raise the
reliability of the test.

The Thorndike Definition

Thorndike (1971) has defined a test as fair if, and only if,
subgroup mean differences in standard score form are equal on the
test and the criterion. Thorndike noticed that while using two
regression lines is often the only ethical procedure from the point
of view of unqualified individualism, it need not be required by a
specific kind of quota ethic. In particular, if the black
regression line is lower, then normally blacks would show a lower mean
on both predictor and criterion. Suppose that blacks are one
standard deviation lower on both and that validity is .50 for both groups.
If we knew the actual criterion scores and set the cut-off at the
white mean on the criterion, then fifty percent of the whites and
sixteen percent of the blacks would be selected. However, if the
predictor score is used with two regression lines, then fifty
percent of the whites but only two percent of the blacks will be
admitted. Thorndike then argues that this is "unfair" to blacks "as
a group." He then recommends that we throw out individualism as
an ethical imperative and replace it with a specific kind of quota.
The quota that he defines as the "fair share" for each group is the
percentage of that group that would have been selected had the
criterion itself been used or had the test had perfect validity. In
the above situation, for example, Thorndike's definition would
consider the selection procedure fair only if 16 percent of the black
applicants were selected.
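
The percentages in this example can be reproduced with a short calculation. The sketch assumes normally distributed scores with unit within-group standard deviations, and assumes that an applicant is admitted when the performance predicted from his own group's regression line reaches the cut-off; the helper names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def proportion_above(cutoff, mean, sd):
    """Share of a normal distribution lying above the cutoff."""
    return 1.0 - normal_cdf((cutoff - mean) / sd)

validity = 0.50
white_mean, black_mean = 0.0, -1.0  # standard score units, test and criterion
cutoff = white_mean                 # criterion cut-off at the white mean

# Selection on the criterion itself: Thorndike's "fair share".
white_quota = proportion_above(cutoff, white_mean, 1.0)
black_quota = proportion_above(cutoff, black_mean, 1.0)

# Selection on predicted performance from each group's own regression
# line: predicted scores are centered on the group's criterion mean
# and have a standard deviation equal to the validity.
white_selected = proportion_above(cutoff, white_mean, validity)
black_selected = proportion_above(cutoff, black_mean, validity)
```

The black cut-off falls one standard deviation above the black mean on the criterion but two standard deviations above it on predicted performance, which is why the selected share drops from about 16 percent to about 2 percent.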






Once Thorndike's definition is shown to be a form of quota
setting (cf. also Schmidt and Hunter, 1974), then the obvious
question is "Why his quotas?" After all, the statement "sixteen
percent of the blacks can perform at the required level" would
not apply to the blacks actually selected and is in that sense
irrelevant. In any event, it seems highly unlikely that this method
of setting quotas would find support among those adherents of the
quota ethic who focus on population proportions as the proper
basis of quota determination. Thorndikean quotas will generally
be smaller than population-based quotas. On the other hand,
Thorndike-determined quotas may have considerable appeal to large
numbers of Americans as a seemingly fair compromise between the
requirements of qualified individualism and the merit principle, on
the one hand, and the social need to upgrade employment levels of
minority group members, on the other.

There is another question that must be raised of Thorndike's
position: Is it ethically compatible with the use of imperfect
selection devices? Assume, for example, that one is using a test
score of 50 (mean = 50; SD = 10) as a cutoff and knows from past data
that fifty percent of those at or above this test score will be
successful. Applicants with test scores of 49 can then correctly
state "if we were all admitted, then 49 percent of us would
succeed. Therefore according to Thorndike, 49 percent of us should
be admitted. Yet we were all denied. Thus, you have been unfair
to our group, those people with scores of 49 on the test." Thus,
strictly speaking, Thorndike's ethical position precludes the use
of any predictor cut-off in selection, no matter how reasonably
determined. Instead, from each predictor category one must
select that percentage which would fall above the criterion cut-off
if the test were perfectly valid. For example, if one wanted to
select 50 percent of applicants and the validity were .60, then
he would have to take 77 percent of those who lie one standard
deviation above the mean, fifty percent of those within one SD of
the mean, and 23 percent of those who fall one standard deviation
below the mean. And Thorndike's definition could be interpreted,
of course, as requiring the use of even smaller intervals of test
scores.

There are at least two problems with this procedure. First,
one must attempt to explain to applicants with objectively higher
qualifications than some selectees why they were not admitted, a
rather difficult task and, from the point of view of
individualism, an unethical one. Second, the general level of performance
will be considerably lower than if the usual cutoff had been used.
In the previous example, the mean performance of the top fifty
percent on the predictor would be .48 standard score units, while the
mean performance of those selected by the Thorndike ethic would be
.29. That is, in this example using Thorndike's quotas has the
effect of cutting the utility of the predictor to about 60 percent
of its value under the usual cutoff (.29/.48), a reduction of about
40 percent. (These calculations are shown in the Appendix.)
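
The two mean performance figures can be checked numerically. The sketch below is not the Appendix calculation, which is not reproduced here. It assumes a bivariate normal test and criterion with validity .60, and assumes the Thorndikean rule is applied at every individual test score (the limiting case of ever smaller score intervals): an applicant with standardized test score x is admitted with probability equal to the chance that his criterion score exceeds the criterion mean. All function names are hypothetical.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean_performance(validity, select_prob, step=0.001, limit=8.0):
    """Mean criterion score (standard units) of those selected, when an
    applicant with test score x is selected with probability
    select_prob(x). Midpoint-rule sum over the test-score distribution."""
    n = int(round(2.0 * limit / step))
    total_selected = 0.0
    total_performance = 0.0
    for i in range(n):
        x = -limit + (i + 0.5) * step
        weight = normal_pdf(x) * step * select_prob(x)
        total_selected += weight
        total_performance += weight * validity * x  # E[y | x] = validity * x
    return total_performance / total_selected

r = 0.60
residual_sd = math.sqrt(1.0 - r * r)

def top_half(x):
    """Usual cutoff: admit everyone above the test mean."""
    return 1.0 if x > 0.0 else 0.0

def thorndike(x):
    """Share admitted at test score x under the Thorndikean rule."""
    return normal_cdf(r * x / residual_sd)

mean_top_half = mean_performance(r, top_half)
mean_thorndike = mean_performance(r, thorndike)
```

Under these assumptions `thorndike(1.0)` and `thorndike(-1.0)` give the 77 and 23 percent figures quoted earlier, and the two means come out at about .48 and .29.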

One possible reply to the criticism that the Thorndike
definition leads to a proliferation of subgroups would maintain that
it need not be interpreted as requiring application to all
definable groups. The definition is to be applied only to "legitimate
minority groups," and this would exclude groups defined solely by
obtained score on the predictor. If agreement could be reached
that, for example, blacks, Chicanos, and Indians are the only
recognized minority groups, the definition might be workable. But
such an agreement is highly unlikely. On what grounds could we
fairly exclude Polish, Italian, and Greek Americans, for example?

Perhaps an even more telling criticism can be made. In a
college or university, performance below a certain level means a
bitter tragedy for a student. In an employment situation, job
failure can often be equally damaging to self-esteem. In the selection
situation described above, the rate of subsequent failure after
admission would be one-fourth if the top half were admitted, but
one-third if a Thorndikean admission rule were used. Furthermore, most
of the increase in failures comes precisely from the poor-risk
admissions. Their failure rate is two-thirds. Thus in the end, a
Thorndikean rule may be even more unfair to those at the bottom of
the test score distribution than to those at the top.

Darlington's Definition No. 3

Let us first review the analyses in
Darlington's (1971) article and then consider his Definition No. 3
in some detail. Darlington's first step was a restatement of the
Cleary (1968) and Thorndike (1971) criteria for a "culturally fair"
test in terms of correlation rather than regression. Again
considering a comparison of blacks and whites, let X be the predictor, Y
the criterion, and C the indicator variable for "culture" (i.e., C=1
for whites, C=0 for blacks). Darlington made the empirically
plausible assumption that the groups have equal standard deviations on
both predictor and criterion and that the validity of the predictor
is the same (Schmidt, Berner, and Hunter, 1973) for both groups
(hence parallel regression lines). Darlington then correctly noted
that Cleary's (1968) criterion for a "fair" test can be stated

\[ r_{CY \cdot X} = 0 \]

That is, there is no criterion difference between the races
beyond that produced by their difference on X (if any). If all
people are selected using a single regression line, then
Thorndikean quotas are guaranteed by Darlington's Definition No. 2,
i.e.,

\[ r_{CX} = r_{CY} \]

That is, the racial difference on the predictor must equal the
racial difference on the criterion in standard score units.
However, if people are selected using multiple regression or
separate regression lines, then this equation is not correct.
Instead there are two alternate conditions:

\[ r_{XY} = 1 \quad \text{or} \quad r_{CY} = 0 \]

That is, if separate regression lines are used, then the
percentages selected match Thorndike's quotas only if the test has
perfect validity or if there are no differences between the groups
on the criterion.



Darlington then attacked the Cleary definition on two bases:
(1) the reliability issue raised by Linn and Werts (1971), which
was discussed above, and (2) the contention that race itself would
be a fair test by Cleary's definition. Actually if race were taken
as the "test," then there would be no within-group variance on that
predictor, and hence no regression lines to compare. Thus Cleary's
definition cannot be applied to the case where race itself is used
as the predictor test.⁶ The nontrivial equivalent of this is a
test whose sole contribution to predicting Y is the race
difference on the mean of X. But for such a test, the regression lines
are perfectly horizontal and grossly discrepant. Thus in a
real situation, Cleary's definition would rule that a purely
racial test is "biased."

Darlington's Definition No. 3 and Cole's argument. Darlington
(1971) proposed a third definition of test fairness, his Definition
No. 3. This definition did not attract a great deal of attention
until Cole (1973) offered a persuasive argument in its favor.
We will first present Darlington's third definition, his
justification of it, and our critique of that justification. We will
then consider Cole's argument.

If X is the test and Y is the criterion, and C, the variable
of "culture," is scored 0 for blacks, 1 for whites, then
Darlington's Definition No. 3 can be written as follows: The test is
fair if

\[ r_{CX \cdot Y} = 0 \]

His argument for this definition went as follows: The ability
to perform well on the criterion is a composite of many
abilities, as is the ability to do well on the test. If the partial
correlation between test and race with the criterion partialed
out is not zero, there is a larger difference between the races
on the test than would be predicted by their difference on the
criterion. Hence the test must be tapping abilities which are
not relevant to the criterion but on which there are racial
differences. Therefore, the test is discriminatory.

Note that Darlington's argument makes use of assumptions
about causal inference. If those assumptions about causality
are in fact false, then his interpretation of the meaning of
the partial correlation is no longer valid. Are his assumptions
so plausible that they need not be backed up with evidence?
Consider the time-ordering of his argument. He is partialing the
criterion from the predictor. In the case of college admissions,
this means that he is calculating the correlation between race
and entrance exam score with GPA four years later being held
constant. This is looking at the causal influence of the future on
the past and is only valid in the context of very special
theoretical assumptions. The definition would in fact be
inappropriate even in the context of a concurrent validation study since
concurrent validities are typically derived only as convenient
estimates of predictive validity. Thus even when there is no
time lag between predictor and criterion measurement, one is
operating implicitly within the predictive validity model.

Let us explore the matter of causality more fully through
the use of a concrete example. Let us consider a professional
football coach attempting to evaluate the rookies who have joined
the team as a result of the college "draft." Since the players
have all come from different schools, there are great differences
in the kind and quality of training that they have had.
Therefore the coach cannot simply rely on how well they play their
positions at present; they will undergo considerable change as
they learn the ropes over the next few months. What the coach
would like to know is exactly what their athletic ability is,
without reference to how well they have learned to play to date.
Suppose he decides to rely solely on the 40-yard dash as an
indicator of football ability, i.e., as a selection test. It is
possible that he will then find that he is selecting a much larger
percentage of blacks than he had using his judgment of current
performance. Does this mean that the test discriminates unfairly
against whites? That depends on the explanation for this outcome.
Consider what is required of the defensive lineman on a passing
play. His ability to reach the quarterback before the ball is
thrown depends not only on the speed necessary to go around the
offensive lineman opposing him, but also on his possessing
sufficient arm strength to throw the offensive lineman to one side
(defensive linemen can use their hands). Assume, for the sake
of this example, that blacks are faster, on the average, than
whites, but that there are no racial differences in upper body
strength. Since the 40-yard dash represents only speed, and makes
no measure of upper body strength, it cannot meet the requirements
of Darlington's definition. That is, the 40-yard dash taps only
the abilities on which there are racial differences and does not
assess those which show no such differences.

How does the 40-yard dash behave statistically? If speed
and upper body strength were the only factors in football ability
and if performance on the 40-yard dash were a perfect index of
speed, then the correlations would satisfy $r_{YC \cdot X} = 0$.
That is, by Cleary's definition, the 40-yard dash would be an
unbiased test. Since $r_{YC \cdot X} = 0$, $r_{XC \cdot Y}$ cannot
be zero and hence, according to Darlington's definition, the
40-yard dash is "culturally unfair," i.e., biased against whites.
(Since the number of whites selected would be fewer than the
Thorndikean quota, Thorndike too would call the test biased.) If
the coach were aware that upper body strength was a key variable
and were deliberately avoiding the use of a measure of upper body
strength in a multiple regression equation, then the charge that
the coach was deliberately selecting blacks would seem quite
reasonable. But suppose that the nature of the missing predictor
(i.e., upper body strength) was completely unknown. Would it then
be fair to charge the coach with using an unfair test?
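
The statistical behavior just described can be checked numerically.
The sketch below is illustrative only. It assumes that football
ability Y is the sum of two uncorrelated, equally weighted factors
(speed and strength), that the 40-yard dash X measures speed
perfectly, and that race C correlates .50 with speed and not at all
with strength. The value .50 and the helper name `partial_corr` are
assumptions made for the example, not figures from the text.

```python
import math

def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Assumed zero-order correlations (hypothetical values).
r_cx = 0.5                 # race with the 40-yard dash (speed)
r_xy = 1 / math.sqrt(2)    # dash with ability: speed is half of the ability variance
r_cy = r_cx * r_xy         # race relates to ability only through speed

# Cleary: race and criterion with the test held constant.
r_yc_x = partial_corr(r_cy, r_xy, r_cx)
# Darlington No. 3: race and test with the criterion held constant.
r_xc_y = partial_corr(r_cx, r_xy, r_cy)

print(f"r_YC.X = {r_yc_x:.3f}")
print(f"r_XC.Y = {r_xc_y:.3f}")
```

Under these assumptions the first partial correlation is zero and
the second is about .38, so the same test passes Cleary's
definition and fails Darlington's.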






At this point we should note a related issue raised by Linn
and Werts (1971). They too considered the case in which the
criterion is affected by more than one ability, one of which is not
assessed by the test. If the test assessed only verbal ability
and the only racial differences were on verbal ability, then the
situation would be like that described in the preceding paragraph:
the test would be fair by the Cleary definition but unfair
according to Darlington's definition No. 3. However, if there are
also racial differences on the unmeasured ability, then the test
will not be fair by Cleary's definition. For example, if blacks
were also lower, on the average, in numerical ability, and
numerical ability was not assessed by the entrance test, then the
black regression line would fall below the white regression line
and the test would be unfair to whites by Cleary's definition.
According to Darlington's definition No. 3, on the other hand, the
verbal ability test would be fair if, and only if, the racial
difference on the numerical test were of exactly the same magnitude
in standard score units as the difference on the verbal test. If
the difference on the missing ability were less than the
difference on the observed ability, then Darlington's definition
would label the test unfair to blacks, while if the difference on
the missing ability were larger than the difference on the
observed ability, then the test would be unfair to whites.
Furthermore, if the two abilities being considered were not the
only causal factors in the determination of the criterion (e.g.,
if personality or financial difficulties were also correlated),
then these statements would no longer hold. Rather, the fairness
of the ability test under consideration would depend not only on
the size of racial differences on the unknown ability, but on the
size of racial differences on the other unknown causal factors as
well. That is, according to Darlington's definition No. 3, the
fairness of a test cannot be related to the causal determinants
of the criterion until a perfect multiple regression equation on
known predictors has been achieved. Therefore, Darlington's
definition can be statistically but not substantively evaluated in
real situations.
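
The dependence of both definitions on the unmeasured ability can be
made concrete. The following sketch assumes a deliberately simple
model that is not in the text: two groups of equal size, a criterion
equal to the sum of verbal and numerical ability, abilities that are
uncorrelated with unit variance within each group, and a test that
measures verbal ability perfectly. `d_verbal` and `d_numerical` are
hypothetical group mean differences in within-group standard
deviation units, and group membership is coded 1 for the
higher-scoring group.

```python
def fairness_numerators(d_verbal, d_numerical, p=0.5):
    """Return the numerators of r_YC.X (Cleary) and r_XC.Y (Darlington No. 3).

    Only the sign and the zero point matter here, so the positive
    denominators of the partial correlations are omitted.
    """
    q = p * (1 - p)                     # variance of the 0/1 group code
    d_total = d_verbal + d_numerical    # group difference on the criterion

    var_x = 1 + q * d_verbal ** 2       # test = verbal ability
    var_y = 2 + q * d_total ** 2        # criterion = verbal + numerical
    cov_xc = q * d_verbal
    cov_yc = q * d_total
    cov_xy = 1 + q * d_verbal * d_total

    cleary = cov_yc - cov_xy * cov_xc / var_x
    darlington = cov_xc - cov_xy * cov_yc / var_y
    return cleary, darlington

for d_num in (0.0, 0.5, 1.0, 1.5):
    cleary, darlington = fairness_numerators(1.0, d_num)
    print(f"d_numerical={d_num:.1f}  Cleary={cleary:+.3f}  Darlington={darlington:+.3f}")
```

In this model the Cleary quantity is zero only when there is no
group difference on the unmeasured ability, while the Darlington
quantity is zero only when the two differences are equal, and it
changes sign as the unmeasured difference passes the measured one.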

For purposes of illustration, consider a theory of academic
achievement in college. Suppose that the college entrance test
were in fact a perfect measure of academic ability for high school
seniors. Why is the validity not perfect?

Consider three men of average ability. Sam meets and marries
Wonder Woman. She scrubs the floor, earns 200 dollars a week, and
worships the ground Sam walks on. Sam carries a B average. Bill
dates from time to time, gets hurt a little, turns off once or twice


statistical definition thus does not fit his substantive
assumptions in this context, unless one is willing to accept luck
as an "ability" and treat it as any other ability would be treated.

The shortcoming of this definition becomes even clearer
if we alter slightly the example in the above paragraph. Suppose
that the world became more benign and that the tendency for blacks
to have bad luck disappeared. Then, making the same assumptions
as above (i.e., a perfect test and our theory of academic
achievement), the regression curves would be equal, and
$r_{YC \cdot X} = 0$. Thus according to Cleary's definition, the
test would be unbiased against either group. Darlington's
definition No. 3 would now label the test as unfair to blacks.
This last statement is particularly interesting. In our theory of
achievement we have assumed that exactly the same ability lies at
the base of performance on both the test and later GPA. Yet it is
not true in our example that $r_{XC \cdot Y} = 0$. Thus this
example has shown that Darlington's substantive interpretation of
$r_{XC \cdot Y}$ does not hold with our additional assumption
(of a non-statistical nature), and hence his argument as to the
substantive justification of his definition is not logically valid.

We note in passing that our modified example poses a problem
for Cleary's definition as well as for Darlington's. If the
difference between the regression lines were in fact produced by
group differences in luck, then would it be proper to label the
test as biased? And if this model were correct, how many
unqualified individualists would feel comfortable in using separate
regression lines so as to take into account the fact that blacks
have a tougher life (on the average) and hence make poorer GPAs,
ability constant? In the case of both definitions, this analysis
points up the necessity of substantive models and considerations.
Statistical analyses alone can obscure as much as they illuminate.

As mentioned earlier, Darlington's definition No. 3 received
little attention until a novel and persuasive argument in its
favor was advanced by Cole (1973). Her argument was this: Consider
those applicants who would be "successful" if selected. Should
not such individuals have equal probability of being selected
regardless of racial or ethnic group membership? Under the
assumption of equal slopes and standard deviations for the two
groups, the answer to her question is in the affirmative only if
the two






on girls who like him, and generally has the average
ups and downs. Bill carries a C average. Joe meets and marries
Wanda the Witch. She lies around the house, continually nags
Joe about money, and continually tells him that he is sexually
inadequate. As Joe spends more and more time at the local bar,
his grades drop to a D average and he is eventually thrown out
of school. In a nutshell, the theory of academic achievement
that we wish to consider is this: achievement consists of ability
plus luck, where luck is a composite of money troubles, sexual
problems, automobile accidents, deaths in the family, and other
incidents in personal history. Luck in this sense is a random
variable but cannot be considered random error, since its effects
are stable over time. According to this theory, a difference
between the black and white regression lines (over and above the
effect of test unreliability) indicates that blacks are more likely
to have bad luck than whites are. Before going on to statistical
questions, we note that because we have assumed a perfect ability
test, there can be no missing ability in the following discussion.
And because we have assumed that non-ability differences are solely
determined by luck, the entity referred to as "motivation" is in
this model simply the concrete expression of luck in terms of overt
behavior. That is, in the present example, motivation is assumed
to be wholly determined by luck and hence already included in the
regression equation.

Now let us consider the statistical interpretations of the
fairness of our hypothetical perfectly valid (with respect to
ability) and perfectly reliable test. Since blacks are assumed to
be unlucky, as well as lower, on the average, in measured academic
ability, the racial difference in college achievement in this model
will be greater than that predicted by ability alone and hence the
regression lines of college performance onto ability will not be
equal. Thus according to Cleary the test is biased against whites.
According to Thorndike the test is probably approximately fair
(perhaps slightly biased against blacks). According to Darlington,
the test could be either fair or unfair. If the racial difference
on luck were about the same in magnitude as the racial difference
on the ability test, then the test would be fair. But if the
racial difference on luck were less than the difference on
ability, then the test would be unfair to blacks. That is,
the Darlington assessment of the fairness of the test would not
depend on the validity of the test in assessing ability, but on
the relative harshness of the personal-economic factors
determining the amount of luck accorded the two groups.






regression lines of test on criterion are the same (and hence
$r_{XC \cdot Y} = 0$). That is, Cole's definition is the same as
Cleary's with the roles of the predictor and criterion reversed.
However, this similarity of statement does not imply compatibility;
just the reverse. If there are differences between the races on
either test or criterion, then the two definitions are compatible
only if the test validity is perfect. So the two definitions
are almost invariably in conflict.
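
The incompatibility can be seen by writing each definition as a
condition on zero-order correlations. The derivation below is a
sketch that assumes linear regressions and uses the standard formula
for a first-order partial correlation; C denotes group membership,
X the test, and Y the criterion.

```latex
\begin{align*}
r_{YC \cdot X} = 0 &\iff r_{CY} = r_{CX}\, r_{XY} && \text{(Cleary)} \\
r_{XC \cdot Y} = 0 &\iff r_{CX} = r_{CY}\, r_{XY} && \text{(Cole; Darlington No. 3)}
\end{align*}
Substituting the second condition into the first gives
\[
r_{CY} = r_{CY}\, r_{XY}^{2},
\]
so both conditions hold together only if $r_{CY} = 0$
(and hence $r_{CX} = 0$) or $r_{XY}^{2} = 1$.
```

That is, when the groups differ on the test or the criterion, the
two conditions can be met together only by a test of perfect
validity.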

Although Cole's argument sounds reasonable and has a great
deal of intuitive appeal, it is flawed by a hidden assumption.
Her definition assumes that differences between groups in
probability of acceptance given later success if selected are due
to discrimination based on group membership. Suppose that the two
regression lines of criterion performance as a function of the
test are equal (i.e., the test is fair by Cleary's definition).
If a black who would have been successful is rejected while a
white who fails is accepted, this need not imply discrimination.
The black would not be rejected because he is black, but because
he made a low score on the ability test. That is, the black would
have been rejected because his ability at the time of the
predictor test was indistinguishable from that of a group of other
people (of both races) who, on the average, would have had low
scores on the criterion.

To make this point more strongly, we note that according to
Cole's definition of a fair test, it is unethical to use a test
of less than perfect validity. To illustrate this, consider the
use of a valid ability test to predict academic achievement in
any one group, say whites, applying for university admission. If
the university decides to take only the people in the top half of
the distribution of test scores, then parents of applicants in the
bottom half, acting under Cole's definition, might well file suit
charging discriminatory practice. According to Cole, an applicant
who would be successful if selected should have the same
probability of being selected regardless of group membership. That
is, among the applicants who would have been successful had they
been selected, there are two groups. One group has a probability
of selection of 1.00 because their score on the entrance exam is
higher than the cut-off. The other group of potentially successful
applicants has a selection probability of .00 because their exam
score is lower than the cut-off. According to Cole, we should ask:
"Why should a person who would be successful be denied a college
berth merely because he had a low test score? After all, it's
success that counts, not test scores." But the fact is that, for
any statistical procedure that does not have perfect validity,
there must always be applicants who will be incorrectly predicted
to have low performance, i.e., there will always be successful
people whose predictor score is down with the generally
unsuccessful people instead of up with the generally successful
people (and vice versa). In that sense, anything less than a
perfect test will always be "unfair" to the potentially high
achieving people who were overlooked. It can be seen that lack of
perfect validity functions in exactly the same way as test
unreliability, discussed earlier.
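
The point that imperfect validity must leave some potentially
successful applicants unselected can be illustrated by simulation.
The sketch below is hypothetical: it assumes standardized, normally
distributed test and criterion scores in a single group, a cut-off
at the test mean, and "success" defined as above-average criterion
performance. The function name and the validity values are
illustrative choices.

```python
import math
import random

def selection_rate_among_successful(validity, n=20000, seed=1):
    """Fraction of potentially successful applicants who score above the cut-off."""
    rng = random.Random(seed)
    resid_sd = math.sqrt(1 - validity ** 2)
    successful = selected = 0
    for _ in range(n):
        x = rng.gauss(0, 1)                              # test score
        y = validity * x + resid_sd * rng.gauss(0, 1)    # criterion score
        if y > 0:            # would succeed if selected
            successful += 1
            if x > 0:        # above the cut-off (top half of test scores)
                selected += 1
    return selected / successful

for r in (0.3, 0.5, 0.8, 1.0):
    rate = selection_rate_among_successful(r)
    print(f"validity {r:.1f}: {rate:.2f} of the successful are selected")
```

Only at a validity of 1.00 is every potentially successful
applicant above the cut-off; at any lower validity some of them
fall below it, whatever their group.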

As noted earlier in the case of Thorndike's definition, this
problem could be partly overcome in practice if restrictions
arrived at by social consensus could be put on the defining of bona
fide minority groups. But given the almost unlimited number of
potentially definable social groups, it is unlikely that social
or legal consensus could be reached limiting the application of
this definition to blacks, Chicanos, American Indians, and a few
other groups.

Basically Cole has noted the same fact that Thorndike noted:
that in order for a test with less than perfect validity to be
"fair" to individuals, the test must be "unfair" to groups. In
particular, in our example, the group of applicants who score
below average on the test will have none of their members accepted
despite the fact that some of them would have shown successful
performance if selected. It is thus "unfair" to this group.
However, it is "fair" to each individual, since each is selected
or rejected based on the best possible estimate of his future
performance. It is perhaps important to note that this is not a
problem produced by the use of psychological tests; it is a
problem inherent in selection decisions. Society and its
institutions must make selection decisions. They are unavoidable.
Elimination of valid psychological tests will usually mean their
replacement with devices or methods having less validity (e.g.,
the interview), thus further increasing the "unfairness" to
individuals or groups.



Darlington's Definition No. 4

The fourth concept of test bias discussed by Darlington (1971)
defines a test as fair only if $r_{CX} = 0$. Hence, any test which
shows any ethnic differences at all in mean score is considered
unfair, regardless of the magnitude of the group difference in
performance. This concept of test fairness corresponds directly
to the ethic of population-proportion-based quotas.

A Fifth Definition of Test Fairness

After defining and discussing four different statistical
models of test fairness, Darlington (1971) turned to the commonly
occurring prediction situation in which there is a difference
favoring whites on both the test and the criterion and the black
regression equation falls below that for whites. This situation is
shown in Figure 3a. Noting that the use of separate regression
equations (or the equivalent use of a multiple regression equation
with race as a predictor), as required by Cleary's (1968)
definition, would often admit or select only an extremely small
percentage of blacks, Darlington introduced his concept of the
"culturally optimal" test. Under this concept, admissions officers
at a university, for example, are asked to consider two potential
graduating seniors, one white and the other black, and to indicate
how much higher the white's GPA would have to be before the two
candidates would be equally attractive. This number is symbolized
K and given a verbal label such as "racial adjustment coefficient."
Then in determining the fairness of the test, K is first
subtracted from the actual criterion scores (GPAs) of each of the
white subjects. If these altered data satisfy Cleary's (1968)
definition of a fair test, the test is considered "culturally
optimal."
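
Darlington's adjustment can be sketched in a few lines. The example
below uses made-up, noise-free data in which the two groups share a
slope and, as a simplification, the same test scores, but the white
regression line lies `DELTA_Y` grade points above the black line.
`fit_line` is a hypothetical helper implementing ordinary least
squares. The sketch shows that subtracting K from the white
criterion scores removes the intercept difference only when K
equals that gap.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

DELTA_Y = 0.4                        # assumed gap between the two regression lines
test_scores = [40, 50, 60, 70, 80]   # hypothetical test scores

white_gpa = [1.0 + 0.03 * x for x in test_scores]
black_gpa = [1.0 - DELTA_Y + 0.03 * x for x in test_scores]

def intercept_gap(k):
    """White minus black intercept after subtracting k from white criterion scores."""
    _, white_int = fit_line(test_scores, [y - k for y in white_gpa])
    _, black_int = fit_line(test_scores, black_gpa)
    return white_int - black_int

for k in (0.0, 0.4, 0.6):
    print(f"K = {k:.1f}: intercept gap = {intercept_gap(k):.3f}")
```

A value of K below the gap leaves the white line above the black
line, and a value above the gap pushes it below, which is the sense
in which the choice of K alone determines whether the altered data
satisfy Cleary's definition.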

Figure 3 illustrates the geometrical meaning of Darlington's
altered criterion. If the admissions officer chooses a value of
K which is equal to $\Delta Y$ in Figure 3a, then the altered data
will appear as in Figure 3b, i.e., there will be a single common
regression line and the test as it stands will be "culturally
optimal." If, however, an overzealous admissions officer chooses a
value of K greater than $\Delta Y$, then the altered data will
appear as in Figure 3c, i.e., the test will be biased against
blacks according to Cleary's (1968) definition and will thus not
be "culturally optimal." Although Darlington is willing to tamper
with criterion scores, he does not allow for application of this
process to predictor scores. Thus if the situation shown in
Figure 3c obtains, it can be corrected only by (1) modifying the
factor structure of the test in such a way that the data move to
the configuration in Figure 3b or (2) abandoning the test and
seeking another which meets the requirements of Figure 3b.
Similarly, should an uncooperative admissions officer select
$K < \Delta Y$, and thus produce the situation shown in Figure 3d,
the only remedies are changes in the nature of the test or the
introduction of an entirely new test.

[Figure 3. Darlington's (1971) method of altering the data
to define a "culturally optimal" test. Panels 3c and 3d show the
altered data when $K > \Delta Y$ and when $K < \Delta Y$,
respectively.]

What is the end result of these complicated procedures? From
a statistical point of view, subtracting a constant from the
criterion scores of whites is identical in its results to adding an
equivalent constant to the scores of blacks without changing the
black prediction equation. Either procedure can be used to alter
the prediction situation so as to create the impression of a
single regression line when in fact group intercepts are different.
If a single regression line is then used in practice, race as a
predictor of performance is ruled out. Thus this definition of
Darlington's corresponds, perhaps ironically, to qualified
individualism.



II. ETHICAL POSITIONS, STATISTICAL DEFINITIONS, AND PROBLEMS

Frank L. Schmidt
Personnel Research and Development Center
U.S. Civil Service Commission



In this section, we briefly relate each ethical position to
its appropriate statistical operationalizations and point out some
of the advantages and disadvantages of each ethical approach.

Unqualified Individualism: Models and Problems

The remedies advanced for tests unfair by Cleary's (1968)
definition make clear that the Cleary approach is one of
unqualified individualism. The recommended use of separate
regression equations when these are not equal for all groups is
clearly equivalent to the use of race as a predictor of
performance. It is this requirement that race, sex, age, and other
such predictors must be employed when valid that would seem to
create legal problems for unqualified individualism. The 1964
Civil Rights Act, the 1972 Equal Employment Opportunity Act, and
other such legislation specifically forbid personnel
decision-making based on these variables. Ironically, the Cleary
(1968) approach to test fairness, with its required adjustments,
is endorsed, recommended, and even required by the guidelines on
employment testing published by both the U.S. Equal Employment
Opportunity Commission (1970) and the Office of Federal Contract
Compliance (1971). These guidelines require that, where valid,
race be used as a predictor of job success, even though the laws
and executive orders on which the guidelines are ostensibly based
seem to be clear in forbidding such use. There is no ready
explanation for this apparent contradiction.

There is another ethical imperative inherent in unqualified
individualism which gives rise to potential problems, in this
case technical rather than legal. Unqualified individualism
requires that one apply to each candidate that prediction
procedure that is most valid for him. Although it is technically
impossible to develop separate procedures for each individual, it
is often feasible to develop separate regression equations for
different groups, and in these cases the ethics of unqualified
individualism requires that this be done. This could theoretically
lead to the impossible task of constructing different prediction
equations for all definable subpopulations. It is also important
to note that each such equation must be maximally valid, because
low predictability within any group, relative to other groups,
leads to greatly reduced selection opportunities for members of
that group, at least at the selection ratios commonly in use
(i.e., SR < .50).

Consider an example in which the validity of the test is
zero for one group. The predicted criterion score for everyone
in that group is then the same: the mean criterion score for
that group. Thus either everyone in that group is accepted or
everyone in the group is rejected. If that group is in fact highly
homogeneous on the criterion, then this is perfectly reasonable.
But if the zero-validity group has the same degree of spread on
the criterion as other groups, then this lack of discrimination
poses ethical problems: either a great many poor prospects are
being admitted, or a great many excellent prospects are being
overlooked.
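
The all-or-none consequence of zero validity follows directly from
the regression equation. In the sketch below, the group statistics,
the cut-off, and the function name `predicted_criterion` are
hypothetical values chosen for illustration.

```python
def predicted_criterion(x, validity, mean_x, sd_x, mean_y, sd_y):
    """Least-squares prediction of the criterion from a test score."""
    return mean_y + validity * (sd_y / sd_x) * (x - mean_x)

scores = [35, 50, 65, 80]   # hypothetical test scores of four applicants
cutoff = 2.5                # hypothetical minimum predicted GPA for admission

for validity in (0.5, 0.0):
    preds = [predicted_criterion(x, validity, 50, 10, 2.3, 0.6) for x in scores]
    admitted = [p >= cutoff for p in preds]
    print(validity, [round(p, 2) for p in preds], admitted)
```

With a validity of .50 the predictions differ and the cut-off
separates the applicants; with a validity of zero every prediction
equals the group mean, so the whole group falls on one side of the
cut-off.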

Fortunately for the ethic of unqualified individualism, the
research evidence strongly indicates that differential validity
by race is no more than a chance phenomenon (Schmidt, Berner, and
Hunter, 1973). The same may later be shown with respect to other
population subgroups, thus greatly reducing the scope of this
problem. The problem would not thus be eliminated, however;
although the same tests may be valid across population subgroups,
and regression slopes may be equal, there is much research evidence
(Ruch, 1972; Stanley, 1971; Temp, 1971; Campbell, et al., 1973)
that intercepts often differ significantly. Thus the task of
testing and adjusting for intercept differences remains.

Qualified Individualism: Models and Problems



Of the statistical models of test fairness we have reviewed,
only Darlington's fifth definition, his "culturally optimal" test,
corresponds exactly to the ethic of qualified individualism. It
should be added that, although this conclusion is certain, it is
difficult to be certain in reading Darlington's (1971) article that
it is, in fact, what he initially intended. It should be noted
that this ethical position does not require that variables like
race, sex, religion, etc., not be statistically valid predictors
of future performance, but only that, should they be valid, they
must not be used. Thus both tests that meet, and those that do
not meet, Cleary's (1968) requirements are defined as fair so
long as a single regression is used, and, of course, the
complicated analysis recommended by Darlington (1971) to assure
"cultural optimality" is not required in practice. This position
seems to come closest to that embodied in the various civil rights
laws, and, as such, might be expected to encounter few legal
hazards. However, as noted above, EEOC and OFCC employment testing
guidelines have apparently endorsed the Cleary (1968) approach to
test fairness and thus have adopted a position of unqualified
individualism. This apparent contradiction between the wording and
intent of the law and the Federal rules designed to enforce the law
is unexplained. It does, however, raise the possibility, however
remote, that employers and institutions adopting a position of
qualified individualism could have charges of discrimination
brought against them.

Advocates of qualified individualism must face another, more
subtle but perhaps more real, problem. The ethical imperative
here requires that the prediction equation that has maximum
validity for the entire population, without regard to group
membership, be identified and employed. But there is a difficulty
in doing this. Suppose, for example, that for a certain city
college the black regression line falls below the white regression
line, i.e., race is a valid predictor for that college. Although
use of race as a predictor is, of course, forbidden to the
qualified individualist, there may be alternative ways of
increasing the overall validity of the prediction equation that are
equally objectionable. For example, if race is a valid predictor,
then a properly coded version of the student's address may also be
a valid predictor and increase overall validity. This indirect
indicator of race would probably be detected and rejected, but a
more subtle cue might not be properly identified. The most subtle
problem is the one facing the test constructor: if the black
regression line falls below the white regression line, then the
introduction of items whose content is biased against blacks would
increase the overall validity of the test. If the separate
regression lines of the unqualified individualist are used, then
racially biased test material would have no effect on the
selection of applicants. But if that is forbidden, then material
biased against blacks would lower the black scores on the
predictor and hence make their scores using the white regression
line more accurate. That is, the introduction of material biased
against blacks would reduce the overprediction of black performance
and hence might raise the validity of a single regression line use
of the test. 7



The problem in its general form, then, is that any measured
variable which correlates with race, sex, religion, etc. (i.e.,
shows group differences) can be considered to be an indirect (and
imperfect) indicator of group membership. Since he is forbidden
to use group membership itself as a predictor, the qualified
individualist may be tempted to substitute indirect indicators of
group membership that may be "unfair." How can he decide whether a
given race-correlated predictor is "fair" or "unfair"? This
question can be answered, although all answers involve some
element of judgment. The first criterion on which this judgment
can be based is the apparent "intrinsicalness" of the relationship
between the predictor and performance. If the predictor is a job
sample test (e.g., a typing test) assessing the skills actually
required on the job, there is little doubt that the relation is
intrinsic. Scores on a written achievement test could also easily
pass this test, as would a face-valid aptitude test. Scores on a
weighted biographical information inventory, on the other hand,
would be allowed only if they were able to meet the second, less
subjective standard: validity coefficients large enough to be
practically significant for both groups separately. A predictor
which is valid only by virtue of its correlation with race will
show no within-group validity.
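
The second standard can be sketched as a simple check. The data
below are invented so that the predictor is related to the criterion
only through group membership; `pearson_r` is a hypothetical helper,
not a procedure from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented scores: within each group the predictor is unrelated to the criterion.
group_a = {"predictor": [0, 1, 0, 1], "criterion": [0, 0, 1, 1]}
group_b = {"predictor": [2, 3, 2, 3], "criterion": [2, 2, 3, 3]}

overall_r = pearson_r(group_a["predictor"] + group_b["predictor"],
                      group_a["criterion"] + group_b["criterion"])
within_a = pearson_r(group_a["predictor"], group_a["criterion"])
within_b = pearson_r(group_b["predictor"], group_b["criterion"])

print(f"overall r = {overall_r:.2f}")
print(f"within-group r = {within_a:.2f}, {within_b:.2f}")
```

Here the pooled correlation is substantial while both within-group
correlations are zero, which is the pattern the second standard is
meant to detect.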

Thus the qualified individualist's answer to the question
posed in the example above is that if the material to be added
to the test appears to have an intrinsic relation to performance
and if it increases both within-group validities, it is
ethically admissible. It is not "biased" against blacks as blacks
but merely against applicants (of whatever race) who are less
capable of performing well on the criterion. The fact that the
proportion in the low skill group is greater for blacks than
for whites is an ethically irrelevant fact. If it is ethically
permissible to employ a test which shows racial differences,
then it must follow that it is ethically permissible to improve
the validity of the test within the rules described above. In
fact, many qualified individualists would probably require only
that the added material meet one of the two criteria: if the
material is intrinsically related to job performance, one need
not demonstrate significant within-group validities, and vice
versa.



While the preceding distinctions are regarded as crucial
among qualified individualists, they receive short shrift from
those committed to other ethical positions. For one whose ethical
position is that of population-based quotas, any predictor or test
material showing racial differences is ipso facto unfair. The
correlation between test and race is greater than zero and its use
will produce selection ratios different from population
percentages. To the unqualified individualist, any predictor which
shows a statistically reliable correlation with performance,
regardless of within-group validities or content, is fair and there
is a positive ethical obligation to employ it.

The Quota Ethic: Models and Problems

Only Darlington's definition No. 4, which requires the
complete absence of racial or ethnic differences (i.e.,
$r_{CX} = 0$), corresponds to the ethical position of
population-based quotas. But quotas can be set on bases other than
population percentages. Darlington's definition No. 3 offers one
such basis, and Thorndike's (1971) model of test fairness (i.e.,
$r_{CX} = r_{CY}$) sets selection quotas for population subgroups
based on past performance of each group as a whole. Thus, the
individual inclined toward a quota ethic may choose any of these
definitions, depending on how far he chooses to carry the quota
concept; Thorndike's (1971) model represents the smallest
departure from the concept of individual merit, and
population-based quotas, the greatest. 8






As with unqualified individualism, the ethic of quotas is
potentially susceptible to legal problems. It is based on the
legally uncertain proposition that ethnic and social groups as
such, as well as individuals, have constitutional and legal rights.
Our legal and governmental system, on the other hand, is largely
built around the idea of individual rights. If only individuals
have rights, then all quota-based systems, in varying degree, are
unconstitutional, since they require that decisions on the basis
of individual qualities and qualifications must be sacrificed to
the attainment of the proper group ratios, which, in turn, are
based on the idea of group, rather than individual, rights.
(Ironically, quota-based systems may be illegal for the same
fundamental reason that unqualified individualism is: both ethical
systems require decisions to be made, at least to some extent,
directly on the basis of group membership.) A recent case which
would perhaps have done much to clarify this issue was sidestepped
by the U.S. Supreme Court (DeFunis v. Odegaard). But there are
many such "reverse discrimination" suits pending in the courts, and
the legal issue will almost certainly be addressed by the Supreme
Court in the future.

The second major problem characterizing quota-based ethical
systems is that the criterion performance of selectees as a whole
can be expected to be considerably lower than under unqualified
or even qualified individualism (Hunter, Schmidt, and
Rauschenberger, in press). In college selection, for example, the
poor-risk blacks who are admitted by a quota are much more likely
to fail. Thus in situations where low criterion performance
carries a considerable penalty, being selected on the basis of
quotas is a mixed blessing. Second, there is the effect on the
institution. The greater the divergence between the quotas and
the selection percentages based on actual expected performance,
the greater the difference in mean performance in those selected.
If lowered performance is met by increased rates of expulsion or
firing, the quotas are undone and there is considerable anguish
for those selected who did not succeed. Furthermore, the public
image of the institution may suffer as a result of the high rate
of expulsion. On the other hand, if the institution tries to
adjust to the candidates selected, there may be great cost and
inefficiency (Hunter, Schmidt, and Rauschenberger, in press). In
the case of academic institutions, quotas inevitably lower the
average performance of graduates and hence the prestige rating of
the school. Similar considerations apply in the case of the
employment setting, but here the direct and immediate impact on
individual welfare is often greater. For example, assuming valid
selection tests and other instruments, air traffic controllers
hired under any of the quota systems rather than under the Cleary
model would be more likely to make the kinds of errors that can
lead to air disasters. Truck drivers selected under a quota system
would be more likely to be involved in accidents on the road.
Thus in the employment setting differences between the various
models of fairness often translate not only into economic loss
but also into the most precious of all commodities, human lives.

A final, and less momentous, consideration in the case of
quota-based ethical systems concerns methods of selection to be
used within groups once group quotas have been set. Most advo-
cates of quota-based systems would probably invoke individual-
ism at this point, selecting those within each group with the
highest predicted performance. This resort to individualism within
groups, rather than random selection, mitigates somewhat the nega-
tive impact of the quota system on selectee performance. It also
makes clear the underlying ethical assumptions of this approach:
(1) ethnic groups per se have legal rights and these rights over-
ride those of individuals where there is a conflict, and (2) the
individual's right to be considered on the basis of his qualifi-
cations should be recognized when it does not conflict with group
rights (i.e., within ethnic groups).
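
The size of this mitigation can be illustrated with a minimal sketch. It assumes that test and criterion are bivariate normal within a group and are expressed in within-group standard scores; the validity and the within-group selection ratio used at the bottom are hypothetical values chosen only for illustration, and the function name is not from the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def mean_criterion_top_down(validity: float, selection_ratio: float) -> float:
    """Expected criterion score (within-group SD units) of those selected
    top-down on the test, assuming bivariate normality within the group.

    Random selection within the group would ignore the test, so its
    selectees would average 0.0 on this scale.
    """
    cutoff = STD_NORMAL.inv_cdf(1.0 - selection_ratio)     # test cutoff z
    mean_test = STD_NORMAL.pdf(cutoff) / selection_ratio   # mean test score above z
    return validity * mean_test                            # regression of criterion on test


if __name__ == "__main__":
    r, p = 0.5, 0.2  # hypothetical validity and within-group quota
    gain = mean_criterion_top_down(r, p)
    print(f"top-down within group: {gain:.3f} SD above the group mean")
    print("random within group:   0.000 SD above the group mean")
```

Under these assumptions the advantage of top-down over random selection within a group grows with the validity of the test and with the selectivity of the quota.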

Ethical Systems, Statistical Models,
Individual Merit, and Social Goals

The ethical systems and statistical models of decision fair-
ness reviewed in this paper may be scaled along a dimension that
might be called "emphasis on individual merit." The systems and
models at the high end of this continuum are based on the assump-
tion that the right of the individual to be considered on the basis
of his qualifications and expected performance is paramount. Those
at the low end assume that the rights of groups, and social goals
and considerations in general, take precedence over individual
rights whenever there is a conflict. The ordering of the models
and systems along this continuum is as follows: (1) Cleary's (1968)
approach, corresponding to unqualified individualism; (2) Darlington's
(1971) "culturally optimal" test (i.e., his fifth definition), cor-
responding to qualified individualism; (3) Thorndike's (1971) limi-
ted quota model; (4) the more extreme quota model represented by
Darlington's (1971) definition No. 3 and Cole's (1973) definition;
and (5) Darlington's (1971) definition No. 4, corresponding to a
population-based quota system. Selection tests currently used in
employment and education tend to fall somewhere between the Cleary
and Thorndike models (Schmidt and Hunter, 1974; Linn, 1973; Campbell,
et al., 1973), that is, in the general region of qualified indi-
vidualism. Unqualified individualists must conclude that tests
are often slightly biased against the majority group, while to Thorn-
dikeans they are somewhat unfair to the minority group. Those who
adhere to Darlington's definition No. 3 or to the ethic of popula-
tion-based quotas must feel that current tests are markedly unfair
to minority groups. The qualified individualist, of course, con-
cludes that most currently used tests are probably reasonably close
to being fair to all groups.

Which of these definitions will ultimately prove most accept-
able, legally, socially, and ethically, to the American people?
The answer is not yet known or knowable, but it is certain to de-
pend on at least three important factors: (1) the strength of
the commitment of the general public to the idea of individual
merit; (2) public support of the national commitment to increased
minority income, educational, and occupational levels; and (3)
perhaps most importantly, the coming court rulings on the deli-
cate issue of individual rights versus group rights and social
goals. Under these circumstances, and given the inherent subjec-
tivity of decisions in this area, it would be highly inappropriate
for us to urge any one of these ethical positions or statistical
models on psychology as a whole. But it is our hope that, by ex-
plicating the important differences among the various options,
this paper will contribute to the making of informed, intelligent
decisions.






This appendix contains the mathematical calculation of the
expected achievement level of the group that would be selected
by the full application of Thorndike's criterion, i.e., a group
selected so that for each test score the number of people se-
lected is proportional to the probability that persons at that
test level would in fact be "successful". The definition of "suc-
cessful" used below is "performance above average on the crite-
rion". That is, the calculations done below assume a base rate
of .50. The selection ratio assumed is also .50.



For simplicity, both test and performance have been assumed
to be measured in standard scores. The symbol $\varphi(x)$ is the
standard normal density function and the symbol $\Phi(x)$ is the
standard normal cumulative distribution function. The symbol A will
be used for "accepted" (or selected for admission).



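If the criterion of success is the top 50 percent, then in
terms of standard scores, the success criterion is $Y > 0$. Thus
the conditional probability of being accepted is

$$P(A \mid x) = P\{Y > 0 \mid x\} = P\left\{\frac{Y - rx}{\sqrt{1-r^2}} > \frac{-rx}{\sqrt{1-r^2}}\right\} = \Phi\!\left(\frac{rx}{\sqrt{1-r^2}}\right)$$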



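If the number selected at each test score is $P(A \mid x)$, then the
overall selection ratio will be

$$P(A) = \int_{-\infty}^{\infty} \Phi\!\left(\frac{rx}{\sqrt{1-r^2}}\right)\varphi(x)\,dx = \frac{1}{2}$$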
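
The value of one half for the overall selection ratio can be checked numerically. The following is a minimal sketch, not part of the original appendix: the function names, the integration range of plus or minus 8 standard deviations, and the midpoint rule are all choices made here for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def p_accept(x: float, r: float) -> float:
    """P(A | x) = Phi(r x / sqrt(1 - r^2)): the chance that a person with
    standard test score x is above average on the criterion."""
    return STD_NORMAL.cdf(r * x / sqrt(1.0 - r * r))


def selection_ratio(r: float, lo: float = -8.0, hi: float = 8.0, n: int = 4000) -> float:
    """Midpoint-rule estimate of the integral of P(A | x) * phi(x) dx."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h          # midpoint of the i-th slice
        total += p_accept(x, r) * STD_NORMAL.pdf(x)
    return total * h


if __name__ == "__main__":
    for r in (0.3, 0.6, 0.9):
        print(f"r = {r}: selection ratio = {selection_ratio(r):.6f}")
```

Because the acceptance probabilities at $x$ and $-x$ sum to one, the result should come out at one half whatever the validity.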






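The distribution of the test score among those selected is

$$f(x \mid A) = \frac{P(A \mid x)\,\varphi(x)}{P(A)} = 2\,\Phi\!\left(\frac{rx}{\sqrt{1-r^2}}\right)\varphi(x)$$

Since X and Y are in standard score form, the regression of Y on
X is given by

$$E(Y \mid x) = rx$$

Thus the mean criterion score among those selected will be

$$E(Y) = E(E\{Y \mid x\}) = \int rx\,f(x \mid A)\,dx = \int 2rx\,\Phi\!\left(\frac{rx}{\sqrt{1-r^2}}\right)\varphi(x)\,dx$$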
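
Before evaluating the integral, the selection rule itself can be simulated. This is a minimal Monte Carlo sketch under the appendix's assumptions (bivariate normal standard scores, acceptance with probability equal to the chance of being above average on the criterion); the function name, sample size, and seed are arbitrary choices made here, not part of the original.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def simulate_thorndike_selection(r: float, n: int = 100_000, seed: int = 0):
    """Return (selection ratio, mean criterion score of those selected)."""
    rng = random.Random(seed)
    a = r / sqrt(1.0 - r * r)
    resid_sd = sqrt(1.0 - r * r)        # SD of the criterion around its regression on x
    selected = 0
    total_y = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                          # test score
        y = r * x + resid_sd * rng.gauss(0.0, 1.0)       # criterion score
        if rng.random() < STD_NORMAL.cdf(a * x):         # accept with probability P(A | x)
            selected += 1
            total_y += y
    return selected / n, total_y / selected


if __name__ == "__main__":
    ratio, mean_y = simulate_thorndike_selection(0.6)
    print(f"selection ratio = {ratio:.3f}, mean criterion of selectees = {mean_y:.3f}")
```

With a validity of .6 the simulated selection ratio should be close to .50, and the simulated mean criterion score gives an independent check on the value of the integral.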



This is not an easy integral to calculate, and the calculation
below will thus be broken into five steps. First, to simplify
the algebra, we will introduce the parameter a by the definition

$$a = \frac{r}{\sqrt{1-r^2}}$$

In particular, if $r = .6$ then

$$a = \frac{.6}{.8} = \frac{3}{4}$$

The formula for mean criterion performance among those selected
can then be written

$$E(Y) = 2r \int x\,\Phi(ax)\,\varphi(x)\,dx$$

Step 1. First we apply the method of integration by parts:

$$\int x\,\Phi(ax)\,\varphi(x)\,dx = \Phi(ax)\,u(x) - \int u(x)\,a\,\varphi(ax)\,dx$$






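Step 4. Thus, we can finally calculate the main integral:

$$\int_{-\infty}^{\infty} x\,\Phi(ax)\,\varphi(x)\,dx = \frac{1}{\sqrt{2\pi}}\,\frac{a}{\sqrt{1+a^2}}$$

Since a was defined to be

$$a = \frac{r}{\sqrt{1-r^2}}$$

we have

$$1 + a^2 = 1 + \frac{r^2}{1-r^2} = \frac{1}{1-r^2}$$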



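Step 5. Finally we can use the main integral to calculate the ex-
pected achievement level:

$$E(Y) = 2r \int_{-\infty}^{\infty} x\,\Phi(ax)\,\varphi(x)\,dx = 2r\,\frac{1}{\sqrt{2\pi}}\,\frac{a}{\sqrt{1+a^2}} = \frac{2r^2}{\sqrt{2\pi}}$$

For $r = .6$, this formula yields $E(Y) = .288$.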
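
The closed form can be compared with a direct numerical evaluation of the integral. The sketch below is an illustration added here, with hypothetical function names and an arbitrary integration range and step count. Evaluated at full precision, the formula gives about .287 for a validity of .6; the .288 printed above would follow if the square root of two pi were rounded to 2.5, though that explanation is only a guess.

```python
from math import pi, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def expected_criterion_closed_form(r: float) -> float:
    """E(Y) among those selected, from the closed form 2 r^2 / sqrt(2 pi)."""
    return 2.0 * r * r / sqrt(2.0 * pi)


def expected_criterion_numeric(r: float, lo: float = -8.0, hi: float = 8.0,
                               n: int = 4000) -> float:
    """Midpoint-rule estimate of 2 r times the integral of x Phi(a x) phi(x) dx."""
    a = r / sqrt(1.0 - r * r)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h          # midpoint of the i-th slice
        total += x * STD_NORMAL.cdf(a * x) * STD_NORMAL.pdf(x)
    return 2.0 * r * total * h


if __name__ == "__main__":
    r = 0.6
    print(f"closed form: {expected_criterion_closed_form(r):.4f}")
    print(f"numeric:     {expected_criterion_numeric(r):.4f}")
```

Agreement between the two functions across several validities is a quick check that the steps of the derivation fit together.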







FOOTNOTES 






1. This phenomenon would account for perhaps half of the magnitude
of overprediction of black college grade point average found in the
literature. In standard score units, the difference in intercepts
due to unreliability is $(1 - r_{XX})(\mu_W - \mu_B)$, where $r_{XX}$ is the test
reliability and $\mu_W - \mu_B$ is the white-black mean difference on the
criterion (about one S.D.), whereas in the data reported in Linn (1973),
the overprediction is about .37 S.D.
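
The arithmetic in this footnote can be sketched as follows. The one S.D. criterion difference and the .37 S.D. overprediction are the figures given in the footnote; the reliabilities looped over are hypothetical values supplied here, since the footnote as it survives does not state the reliability assumed.

```python
def intercept_gap(test_reliability: float, criterion_mean_diff: float) -> float:
    """Difference in observed intercepts due to test unreliability,
    (1 - r_XX) * (mu_W - mu_B), in criterion standard score units."""
    return (1.0 - test_reliability) * criterion_mean_diff


if __name__ == "__main__":
    observed_overprediction = 0.37      # S.D. units, from Linn (1973) as cited
    for r_xx in (0.70, 0.80, 0.90):     # hypothetical test reliabilities
        gap = intercept_gap(r_xx, 1.0)  # criterion difference of about one S.D.
        share = gap / observed_overprediction
        print(f"r_XX = {r_xx:.2f}: gap = {gap:.2f} S.D. ({share:.0%} of .37)")
```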

2. The reader may wonder why we show so much concern with the reli-
ability of the test and no concern with the reliability of the cri-
terion. Actually, despite its large effect on the validity coefficient,
no amount of unreliability in the criterion has any effect on the re-
gression line of criterion on predictor. Let the true score equations
for X and Y be $X = T + e_1$ and $Y = U + e_2$, and let the true score
regression equation be $U = \alpha T + \beta$. Then the observed regression
line will not have the same coefficients. Let the observed regression
line be $Y = aX + b$. The slope of the observed regression line will be

$$a = r_{XY}\,\frac{\sigma_Y}{\sigma_X} = r_{TU}\,r_{TX}\,r_{UY}\,\frac{\sigma_Y}{\sigma_X} = r_{TU}\,\frac{\sigma_U}{\sigma_T}\,r_{TX}^2 = \alpha\,r_{XX}$$

That is, the slope of the observed regression line is the slope of the
true score regression line multiplied by the reliability of X. How-
ever, note that the slope of the observed regression line is completely
independent of the reliability of Y. The intercept of the observed
regression line is given by

$$b = \mu_Y - a\,\mu_X = \mu_Y - \alpha\,r_{XX}\,\mu_X = \mu_U - \alpha\,r_{XX}\,\mu_T$$

Thus the intercept is also affected by the reliability of X, but is
completely independent of the reliability of Y. If we have equal slopes
on the true score regression equations and equal within-groups test
reliability, any differences in the regression lines will be equal to
the difference between the intercepts and hence independent of $r_{YY}$.
In the case where the true score regression lines are the same, the
difference between the observed regression lines is

$$b_W - b_B = (1 - r_{XX})(\mu_{UW} - \mu_{UB})$$
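
The claim of this footnote can be checked by simulation. The sketch below is an illustration added here, not the authors' procedure: it generates true scores with unit variance, adds independent errors to test and criterion, and fits the observed least-squares line. The error standard deviations, the residual spread of the true criterion score, the sample size, and the seed are all hypothetical choices.

```python
import random


def observed_regression(alpha, beta, sd_e1, sd_e2, mean_t=0.0, n=50_000, seed=1):
    """Simulate X = T + e1 and Y = U + e2, with U = alpha*T + beta + residual,
    and return the least-squares (slope, intercept) of Y on X."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        t = rng.gauss(mean_t, 1.0)                   # true test score, variance 1
        u = alpha * t + beta + rng.gauss(0.0, 0.5)   # true criterion score
        xs.append(t + rng.gauss(0.0, sd_e1))         # observed test score
        ys.append(u + rng.gauss(0.0, sd_e2))         # observed criterion score
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    # sd_e1 = 0.5 gives test reliability r_XX = 1 / (1 + 0.25) = 0.80.
    for sd_e2 in (0.0, 1.0, 2.0):                    # criterion reliability varies
        slope, intercept = observed_regression(1.0, 0.0, 0.5, sd_e2, mean_t=1.0)
        print(f"sd_e2 = {sd_e2}: slope = {slope:.3f}, intercept = {intercept:.3f}")
```

In this setup the slope should stay near .80 and the intercept near .20 (the true criterion mean of 1 minus .80 times the true test mean of 1), however much error is added to the criterion.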

3. While on the topic of reliability, we should note that as the re-
liability approaches .00, the test becomes a random selection device
and is hence utterly reprehensible to an individualist of either stripe.
On the other hand, a totally unreliable test would select blacks in
proportion to population quotas. Ironically, the argument that tests
are biased against blacks because they are unreliable is not only false,
it is exactly opposite to the truth.

4. What Thorndike has rediscovered has long been known to biologists:
Bayes' law is cruel. For example, if one of two equally reproductive
species has a probability of .49 for survival to reproduce and the
other species has a probability of .50, then ultimately the first
species will be extinct. Maximization in probabilistic situations is
usually much more extreme than most individuals expect (Edwards and
Phillips, 1964).
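
The species example can be worked through numerically. This is a minimal sketch under assumptions supplied here: both species start with equal numbers, every survivor leaves the same number of offspring, and "extinct" is read as the first species' share of the combined population shrinking toward zero.

```python
def share_of_first_species(p_first: float, p_second: float, generations: int) -> float:
    """Share of the combined population belonging to the first species after
    the given number of generations, starting from equal numbers."""
    ratio = (p_first / p_second) ** generations   # first-to-second population ratio
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    for g in (0, 10, 100, 1000):
        share = share_of_first_species(0.49, 0.50, g)
        print(f"after {g:>4} generations: share of first species = {share:.6f}")
```

A survival disadvantage of a single percentage point compounds every generation, which is the sense in which small differences in probability lead to extreme outcomes.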

5. Since the groups have equal standard deviations on both predictor
and criterion, assume for algebraic simplicity that the variables have
been scaled so that all within group standard deviations are unity.
This means that deviation scores are standard scores. Suppose the
selection ratio for whites has been determined. Then there is a corre-
sponding score on Y, say $Y^*$, such that the standard score $Y^* - \bar{Y}_W$
would cut off that percentage of whites. To select that same percentage
of whites, there is a predictor score on the test, $X^*_W$, such that

$$X^*_W - \bar{X}_W = Y^* - \bar{Y}_W$$

If the multiple regression equation is

$$\hat{Y} = \alpha X + \beta C + \gamma$$

then the multiple regression cutoff score is

$$\hat{Y}^* = \alpha X^*_W + \beta + \gamma$$

Multiple regression always matches the group means perfectly, so that
$\bar{Y}_W = \alpha \bar{X}_W + \beta + \gamma$ and hence
$\hat{Y}^* = \alpha (X^*_W - \bar{X}_W) + \bar{Y}_W$. The black predictor cutoff $X^*_B$ is
determined by

$$\hat{Y}^* = \alpha (X^*_B - \bar{X}_B + \bar{X}_B) + \gamma = \alpha (X^*_B - \bar{X}_B) + \alpha \bar{X}_B + \gamma$$

Since multiple regression matches means,

$$\bar{Y}_B = \alpha \bar{X}_B + \gamma$$

and hence the black predictor cutoff satisfies

$$\hat{Y}^* = \alpha (X^*_B - \bar{X}_B) + \bar{Y}_B$$

Thorndike's quota for blacks is obtained if the standard score for the
predictor cutoff is the same as the standard score for the criterion
cutoff, i.e., if

$$X^*_B - \bar{X}_B = Y^* - \bar{Y}_B$$

Now we have in general

$$\alpha (X^*_B - \bar{X}_B) = \hat{Y}^* - \bar{Y}_B$$

Thus Thorndike's quotas obtain only if

$$\alpha (Y^* - \bar{Y}_B) = \hat{Y}^* - \bar{Y}_B = \alpha (X^*_W - \bar{X}_W) + \bar{Y}_W - \bar{Y}_B = \alpha (Y^* - \bar{Y}_W) + \bar{Y}_W - \bar{Y}_B$$

This is true only if

$$\alpha (\bar{Y}_W - \bar{Y}_B) = \bar{Y}_W - \bar{Y}_B$$

Thus Thorndike's quotas are obtained only if one of two things is
true: either $\alpha = 1$ or both sides are zero, i.e., either $\alpha = 1.00$
or $\bar{Y}_W - \bar{Y}_B = 0$.

Since the variables were all scaled to have equal within group
standard deviations, the regression weight $\alpha$ is in fact the within
group predictor-criterion correlation. Thus $\alpha = 1$ means that the
test has perfect validity.

The equation $\bar{Y}_W - \bar{Y}_B = 0$ is equivalent to $\bar{Y}_W = \bar{Y}_B$, i.e., no
group difference on the criterion and hence $r_{CY} = 0$.
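
The conclusion of this footnote can be checked with a few numbers. The sketch below is one reading of the argument, written under its stated scaling (all within group standard deviations equal to one); the function names and the numerical values are hypothetical. It compares the black predictor cutoff implied by a common cutoff on predicted performance with the cutoff that Thorndike's definition would require.

```python
def black_cutoff_regression(alpha: float, z_white: float, mean_diff: float) -> float:
    """Black predictor cutoff (deviation from the black test mean) implied by
    a common cutoff on predicted performance. z_white is the white cutoff in
    standard score form; mean_diff is the white minus black criterion mean."""
    predicted_cutoff_above_black_mean = alpha * z_white + mean_diff
    return predicted_cutoff_above_black_mean / alpha


def black_cutoff_thorndike(z_white: float, mean_diff: float) -> float:
    """Black predictor cutoff required by Thorndike's definition: the same
    standard score as the criterion cutoff measured from the black mean."""
    return z_white + mean_diff


if __name__ == "__main__":
    cases = [(0.5, 0.0, 1.0), (1.0, 0.0, 1.0), (0.5, 0.0, 0.0)]  # (alpha, z_white, mean_diff)
    for alpha, z, diff in cases:
        reg = black_cutoff_regression(alpha, z, diff)
        thorn = black_cutoff_thorndike(z, diff)
        print(f"alpha = {alpha}, mean diff = {diff}: regression {reg:.2f}, Thorndike {thorn:.2f}")
```

The two cutoffs agree only in the second and third cases, that is, when the validity is perfect or when the groups do not differ on the criterion.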

6. Darlington's error was a subtle one. He assumed that x^qy^g^ ^
when in fact rQY»Q - which is undefined.

7. This argument, of course, assumes that the newly added biased
material has no detrimental effect on the within-group validities.
We return to this consideration later.

8. Degree of departure from the concept of individual merit is directly
related to loss of selection utility occasioned by use of the fairness
model (Hunter, Schmidt, and Raushenberger, in press).

9. On March 11, 1975, Federal Judge Spencer Williams, United States
District Court, Northern District of California, ruled in Cortez vs.
Rosen that the Cleary model is the "only one which is historically,
legally, and logically required". This ruling, which sustained the use
of a police examination shown to meet Cleary model requirements, is the
first to address the question of the relative legal merits of alterna-
tive fairness models.



REFERENCES



Cleary, T.A. Test bias: Prediction of grades of Negro and white
    students in integrated colleges. Journal of Educational
    Measurement, 1968, 5, 115-124.
Cole, N.S. Bias in selection. Iowa City, Iowa: The American
    College Testing Program, 1972. (Also in Journal of Educa-
    tional Measurement, 1973, 10, 237-255.)
Campbell, J.T., Crooks, L.A., Mahoney, M.H., and Rock, D.A. An
    investigation of sources of bias in the prediction of job
    performance: A six year study. Final Project Report No. PR-73-37.
    Educational Testing Service, Princeton, New Jersey, 1973.
Darlington, R.B. Another look at "cultural fairness." Journal
    of Educational Measurement, 1971, 8, 71-82.
Supreme Court of the United States. DeFunis v. Odegaard decision.
    Washington, D.C.: Author, March 1974.
Edwards, W., and Phillips, L.D. Man as transducer for probabili-
    ties in Bayesian command and control systems. In M.W. Shelly II
    and G.L. Bryan (eds.), Human Judgments and Optimality. New York:
    Wiley, 1964.
Hunter, J.E., Schmidt, F.L., and Raushenberger, J. Fairness of
    psychological tests: Implications of three definitions for
    selection utility and minority hiring. Journal of Applied
    Psychology, in press.
Linn, R.L. Fair test use in selection. Review of Educational
    Research, 1973, 43, 139-161.
Linn, R.L., and Werts, C.E. Considerations for studies of test
    bias. Journal of Educational Measurement, 1970, 7, 1-4.
Ruch, W.W. A re-analysis of published differential validity stud-
    ies. Paper presented at the Symposium, Differential Valida-
    tion Under EEOC and OFCC Testing and Selection Regulations,
Schmidt, F.L., Berner, J.G., and Hunter, J.E. Racial differences
    in validity of employment tests: Reality or illusion?
    Journal of Applied Psychology, 1973, 58, 5-9.
Schmidt, F.L., and Hunter, J.E. Racial and ethnic bias in psy-
    chological tests: Divergent implications of two definitions
    of test bias. American Psychologist, 1974, 29, 1-8.
Stanley, J.C. Predicting college success of the educationally
    disadvantaged. Science, March 19, 1971, 171, 646-647.
Temp, G. Validity of the SAT for blacks and whites in thirteen
    integrated institutions. Journal of Educational Measurement,
    1971, 8, 245-251.






Thorndike, R.L. Concepts of culture-fairness. Journal of Edu-
    cational Measurement, 1971, 8, 63-70.
U.S. Equal Employment Opportunity Commission. Guidelines on em-
    ployment selection procedures. Washington, D.C.: Author,
    1970.
U.S. Office of Federal Contract Compliance. Regulations on Em-
    ployee Testing and Other Selection Procedures. U.S. Depart-
    ment of Labor. Washington, D.C.: Author,






